Section: New Results

Analyzing and Reasoning on Heterogeneous Semantic Graphs

Distributed Artificial Intelligence for Linked Reviewable Data Management on the Semantic Web

Participants : Ahmed El Amine Djebri, Andrea Tettamanzi.

The aim of this PhD thesis, which started this year, is to study and propose original solutions to several key aspects: knowledge representation for uncertain, incomplete and reviewable data; uncertainty representation within a data source, with provenance; distributed knowledge revision and propagation; and reasoning over uncertain, incomplete and distributed data sources. Starting from the open Web of Data, this work aims to give users more objective, exhaustive and reliable information in answer to their queries, based on distributed data sources with different levels of certainty and trustworthiness.

Uncertainty Management

Participant : Andrea Tettamanzi.

In collaboration with Didier Dubois and Henri Prade of IRIT, Toulouse, and with Giovanni Fusco, a geographer at the ESPACE CNRS UNS laboratory, we have developed a theory of uncertain logical gates in possibilistic networks and illustrated its application to a problem of human geography [17].

With Giovanni Fusco, we have approached the problem of selecting among alternative models of a phenomenon, expressed as Bayesian networks, and we have illustrated our solution by applying it to the case of urban sprawl [35].

Finally, together with Didier Dubois and Henri Prade of IRIT, Toulouse, and Célia da Costa Pereira of I3S, we have proposed a possibilistic approach to handling topical metadata about the validity and completeness of information coming from multiple sources, with the aim of aggregating it in a possibilistic belief base [45].

Mining the Semantic Web for OWL Axioms

Participants : Thu Huong Nguyen, Andrea Tettamanzi.

The aim of the research in this PhD thesis is to learn OWL 2 ontologies from RDF data under the open-world assumption. The first task is the extraction of axioms with ontology-learning techniques based on evolutionary algorithms; the second is the development of methods to evaluate the extracted OWL axioms. So far, research work has concentrated on a Grammatical Evolution algorithm that explores the space of class disjointness axioms in search of the ones best suited to describe the recorded RDF data.
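
As a purely illustrative sketch (not the thesis implementation), the idea can be pictured as mapping integer genotypes to candidate disjointness axioms and scoring them against the RDF data; the dataset file, the use of rdflib and the toy fitness function below are assumptions.

    # Hypothetical sketch: one random generation of candidate DisjointClasses(c1, c2)
    # axioms scored against an RDF dataset (a full GE loop would evolve the genotypes).
    import random
    from rdflib import Graph, RDF

    g = Graph()
    g.parse("dataset.ttl", format="turtle")   # placeholder RDF dump

    classes = sorted({o for o in g.objects(None, RDF.type)})

    def decode(genotype):
        # Map two integer codons to a candidate pair of classes.
        c1 = classes[genotype[0] % len(classes)]
        c2 = classes[genotype[1] % len(classes)]
        return c1, c2

    def fitness(axiom):
        # Toy objective: reward large class extensions that share no instances.
        c1, c2 = axiom
        ext1 = set(g.subjects(RDF.type, c1))
        ext2 = set(g.subjects(RDF.type, c2))
        if not ext1 or not ext2:
            return 0.0
        return (len(ext1) + len(ext2)) / (1.0 + len(ext1 & ext2))

    population = [[random.randint(0, 255), random.randint(0, 255)] for _ in range(100)]
    best = max(population, key=lambda geno: fitness(decode(geno)))
    print("Best candidate axiom:", decode(best))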

Logical Foundations of Cognitive Agents

Participant : Andrea Tettamanzi.

Together with Célia da Costa Pereira of I3S, Beishui Liao of Zhejiang University, China, Alessandra Malerba and Antonino Rotolo of the University of Bologna, Italy, and Leendert van der Torre of the University of Luxembourg, we have proposed a computational model for legal interpretation based on fuzzy logic and argumentation, which was presented at the 16th International Conference on Artificial Intelligence and Law [46].

Agent-Based Recommender Systems

Participants : Amel Ben Othmane, Nhan Le Thanh, Andrea Tettamanzi, Serena Villata.

We have proposed a spatio-temporal extension of our multi-context framework for agent-based recommender systems (CARS) [27], to which we have then added representations and algorithms to manage uncertainty, imprecision and approximate reasoning; a paper describing this latter development has been accepted at the 10th International Conference on Agents and Artificial Intelligence (ICAART 2018), to be held in Madeira on January 16–18, 2018.

RDF Mining

Participants : Catherine Faron Zucker, Fabien Gandon, Andrea Tettamanzi, Tran Duc Minh.

In collaboration with Claudia d'Amato of the University of Bari, we have carried on our investigation into extracting knowledge from RDF data, refining our evolutionary approach, called EDMAR, which discovers multi-relational rules from ontological knowledge bases by exploiting the services of an OWL reasoner [42]. In addition, we have finally developed a coherent and organic theory of possibilistic testing of OWL axioms against RDF data [22]. The intuition behind it is to evaluate the credibility of OWL 2 axioms based on the evidence available in the form of a set of facts contained in a chosen RDF dataset.
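
The flavour of this evaluation can be conveyed by a small numerical sketch; the formulas below are a simplified illustration of possibility/necessity scoring from counts of applicable facts, confirmations and counterexamples, and are an assumption rather than the exact definitions of [22].

    import math

    def possibility(support, counterexamples):
        # Degree to which the axiom is compatible with the observed facts.
        if support == 0:
            return 1.0                      # nothing in the data contradicts it
        ratio = (support - counterexamples) / support
        return 1.0 - math.sqrt(1.0 - ratio ** 2)

    def necessity(support, confirmations, counterexamples):
        # Degree to which the observed facts positively support the axiom.
        if support == 0 or counterexamples > 0:
            return 0.0
        ratio = (support - confirmations) / support
        return math.sqrt(1.0 - ratio ** 2)

    # 200 facts the axiom applies to, 180 confirmations, 3 counterexamples.
    print(possibility(200, 3), necessity(200, 180, 3))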

Argument Mining

Participants : Serena Villata, Elena Cabrio, Fabien Gandon, Mihai Dusmanu.

We have presented an argument mining approach that applies supervised classification to identify arguments on Twitter. Moreover, we have introduced two new tasks for argument mining, namely fact recognition and source identification. We have studied the feasibility of the proposed approaches on a set of tweets related to the Grexit and Brexit news topics. The results of this research have been published at the EMNLP 2017 conference [51].
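
For illustration only, the argument-detection step can be viewed as a standard supervised text-classification task over tweets; the toy pipeline below (TF-IDF features and logistic regression) is an assumption and not the feature set or model reported in [51].

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    tweets = ["#Brexit will hurt the economy because trade barriers will rise",
              "Watching the #Brexit debate tonight"]
    labels = [1, 0]        # 1 = argument, 0 = non-argument (toy examples)

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(tweets, labels)
    print(clf.predict(["#Grexit is unavoidable since the debt cannot be repaid"]))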

In this direction, in collaboration with the Heron Lab of the University of Montreal, we have also presented an empirical study of the impact of emotions and mental states on the arguments people put forward in online debates. The results of this research have been published in the Argument & Computation journal [24]. Another empirical experiment with human participants has been conducted to study the impact of the three persuasive argumentation strategies, called Ethos, Logos and Pathos, on the emotions and mental states of debaters. The results of this research have been published at the HCI 2017 conference [28].

Moreover, Serena Villata has co-authored a popular-science paper about computational argumentation for AI Magazine [16].

Finally, Serena Villata, together with Matthias Thimm (Universität Koblenz-Landau), has reported and analyzed the results of the first International Competition on Computational Models of Argumentation (ICCMA) in the Artificial Intelligence journal [23].

Mining Legal Documents

Participants : Serena Villata, Cristian Cardellino, Milagro Teruel, Laura Alonso Alemany.

We have proposed a Named Entity Recognizer, Classifier and Linker for the legal domain, with the aim of improving information extraction from legal texts. With this tool, it is possible to identify relevant parts of a text and connect them to a structured knowledge representation, the LKIF ontology. The tool has been developed with relatively little effort, by mapping the LKIF ontology to the YAGO ontology and, through it, taking advantage of the mentions of entities in Wikipedia. These mentions are used as manually annotated examples to train the Named Entity Recognizer, Classifier and Linker. We have evaluated the approach on held-out texts from Wikipedia and on a small sample of judgments of the European Court of Human Rights, obtaining very good performance, i.e., around 80% F-measure at different levels of granularity. The results of this research have been published at the EACL 2017 conference [30], the FLAIRS 2017 conference [29] and the ICAIL 2017 conference [31], and a poster paper has been published at ISWC 2017 [64]. This research is carried out in the context of the EU H2020 MIREL project. The ICAIL paper received the conference's “Best Innovative Paper” award.

Cognitive Agent-Based Modeling

Participant : Andrea Tettamanzi.

Within the framework of the multi-disciplinary Franco-Colombian TOMSA research project, in collaboration with researchers of the I3S and ESPACE CNRS UNS laboratories and of the University of the Andes, we have developed a novel agent-based modeling approach based on belief-desire-intention (BDI) agents and demonstrated its potential by applying it to the coupled modeling of urban segregation and growth [43].

Robots Autonomously Learning about Objects

Participants : Valerio Basile, Elena Cabrio, Roque Lopez Condori.

Autonomous robots that are to assist humans in their daily lives must recognize and understand the meaning of objects in their environment. However, the open nature of the world means that robots must be able to learn and extend their knowledge about previously unknown objects on-line. In this third year of the project, we have investigated the problem of unknown object hypothesis generation, and employed a Semantic Web mining framework along with deep-learning-based object detectors. This allows us to combine visual and semantic features when generating hypotheses. We have experimented on data from mobile robots in real-world application deployments, showing that this combination improves performance over either method used in isolation.

Moreover, we have built DeKO, a large-scale RDF repository of prototypical knowledge about objects (http://deko.inria.fr/). This version of DeKO mainly provides information about the locations and typical usage of objects (e.g. Telephone LocatedAt Office, Spoon usedFor Eating). In addition, DeKO provides an RDF explorer where users can find knowledge about objects by navigating through their relations. DeKO was built by parsing natural language text with KNEWS [67] and using Distributional Semantics [68].
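
To give a concrete (and purely hypothetical) picture of how such triples could be consumed, the rdflib snippet below builds two DeKO-style statements and retrieves the typical location of an object; the namespace and property names are illustrative, not DeKO's actual vocabulary.

    from rdflib import Graph, Namespace

    DEKO = Namespace("http://deko.inria.fr/ontology#")   # assumed namespace
    g = Graph()
    g.add((DEKO.Telephone, DEKO.locatedAt, DEKO.Office))
    g.add((DEKO.Spoon, DEKO.usedFor, DEKO.Eating))

    # Where is a telephone typically located?
    for place in g.objects(DEKO.Telephone, DEKO.locatedAt):
        print(place)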

Frame clustering is an important module of DeKO, since it allows us to find representative frame instances, i.e. prototypical knowledge about objects. For frame clustering in DeKO, we followed a hierarchical clustering approach, motivated mainly by two reasons: i) it does not require a pre-specified number of clusters and ii) most of these algorithms are deterministic. However, hierarchical clustering is expensive in terms of time, making it too slow for large data sets. To solve this problem, we applied a parallelization strategy using a map-reduce approach, together with some heuristics in the preprocessing phase (e.g. filtering of frame instances). Currently, we are setting up the server environment to run the experiments over the whole DeKO collection. The following paper has been published on this topic: [44].
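
As a rough sketch of the clustering step (using scipy as a stand-in for the distributed map-reduce implementation, and random vectors in place of real frame-instance representations):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    vectors = np.random.rand(50, 300)     # placeholder frame-instance vectors

    # Agglomerative clustering: no pre-specified number of clusters is needed;
    # the dendrogram is cut with a distance threshold instead.
    Z = linkage(vectors, method="average", metric="cosine")
    labels = fcluster(Z, t=0.4, criterion="distance")
    print(len(set(labels)), "clusters")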

Event Identification and Classification in Short Messages

Participants : Amosse Edouard, Elena Cabrio, Nhan Le Thanh.

This work investigates the potential of exploiting information from Linked Open Data knowledge bases (LOD KBs) to detect, classify and track events on social media, in particular Twitter. More specifically, we address three research questions: i) How can messages related to events be extracted and classified? ii) How can events be clustered into fine-grained categories? iii) Given an event, to what extent can user-generated content on social media contribute to the creation of a timeline of sub-events? We provide methods that rely on LOD KBs to enrich the context of social media content; we show that supervised models can achieve good generalisation capabilities through semantic linking, thus mitigating overfitting; and we rely on graph theory to model the relationships between named entities and the other terms in tweets in order to cluster fine-grained events. Finally, we use domain ontologies and local gazetteers to identify relationships between actors involved in the same event, so as to create a timeline of sub-events. We show that enriching the named entities in the text with information provided by LOD KBs improves the performance of both supervised and unsupervised machine learning models. An illustrative sketch of the enrichment step is given after the list of publications below.

The following papers have been published on the topic: [33], [34], [18].
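
A minimal sketch of the enrichment step (assuming DBpedia as the LOD KB and using SPARQLWrapper; the query and feature handling are illustrative, not the exact pipeline described in the cited papers):

    from SPARQLWrapper import SPARQLWrapper, JSON

    def dbpedia_types(entity_uri):
        # Fetch the rdf:type statements of a linked entity from DBpedia.
        sparql = SPARQLWrapper("https://dbpedia.org/sparql")
        sparql.setQuery("SELECT ?type WHERE { <%s> a ?type }" % entity_uri)
        sparql.setReturnFormat(JSON)
        results = sparql.query().convert()
        return [b["type"]["value"] for b in results["results"]["bindings"]]

    tweet_tokens = ["earthquake", "hits", "Nepal"]
    enriched = tweet_tokens + dbpedia_types("http://dbpedia.org/resource/Nepal")
    print(enriched[:10])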

NLP over Song Lyrics

Participants : Michael Fell, Elena Cabrio, Fabien Gandon.

The goal of the WASABI project is to jointly use information extraction algorithms and Semantic Web formalisms to produce consistent musical knowledge bases; Web Audio technologies are then applied to explore them in depth. More specifically, textual data such as song lyrics or free text related to the songs are used as sources to extract implicit information (such as the topics of a song, the places, people, events and dates involved, or even the conveyed emotions) using Natural Language Processing (NLP) algorithms. Jointly exploiting such knowledge, together with information contained in the audio signal, can improve the automatic extraction of musical information, including for instance the tempo, the presence and characterization of the voice, and musical emotions; it can also help identify plagiarism or facilitate music unmixing.

Work in the first half-year has focused on two points. First, we delivered a report on the existing literature on NLP applied to song lyrics. Second, over the last months we have worked on estimating the structure of song texts.

The following paper has been published on the topic: [55].

Conversational Agent Assistant

Participants : Raphaël Gazzotti, Catherine Faron Zucker, Fabien Gandon.

This CIFRE PhD thesis is carried out in collaboration with SynchroNext, a company located in Nice. As part of this thesis, we are interested in setting up an ECA (Embodied Conversational Agent) to handle FAQ-type questions upstream of customer advisers. The ECA will need to integrate a question-answering system to address the most common issue types without human intervention. For this purpose, it must be able to understand the questions users ask in natural language and to reason with the acquired knowledge. Beyond such a question-answering system, the ECA must be able to re-engage the conversation with the Internet user according to the nature of their requests or the sequence of questions they formulate. The objective is to reduce the dropout rate of Internet users on FAQs and to reduce the number of incoming calls and e-mails, enabling customer advisers to focus on more difficult questions.

We cast the various customer questions as an imbalanced multi-label classification problem. In order to improve the categorization results, we augmented the feature vectors with domain-specific knowledge and named entities, and then reduced them with feature selection. We also tune the hyperparameters with Bayesian optimization.
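
A minimal sketch of this setup (toy data and placeholder features; the actual feature augmentation, selection parameters and Bayesian optimization library are assumptions, not the thesis implementation):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MultiLabelBinarizer

    questions = [
        "how do I reset my password",
        "my invoice is missing from last month",
        "my account is locked and I was billed twice",
    ]
    labels = [["account"], ["billing"], ["account", "billing"]]   # toy labels
    Y = MultiLabelBinarizer().fit_transform(labels)

    # One binary classifier per label; class_weight compensates for imbalance.
    clf = OneVsRestClassifier(make_pipeline(
        TfidfVectorizer(),                    # augmented in practice with KB/NE features
        SelectKBest(chi2, k=10),              # feature selection on the enlarged vectors
        LogisticRegression(class_weight="balanced"),
    ))
    clf.fit(questions, Y)
    print(clf.predict(["cannot log into my account"]))
    # Hyperparameters (e.g. k, the regularization strength C) would then be tuned
    # with Bayesian optimization, e.g. scikit-optimize's BayesSearchCV.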

HealthPredict

Participants : David Darmon, Catherine Faron Zucker, Virginie Lacroix-Hugues, Fabien Gandon, Raphaël Gazzotti.

This project is carried out in collaboration with the Département d'Enseignement et de Recherche en Médecine Générale (DERMG) at UNS and with SynchroNext, a company located in Nice. HealthPredict is a digital health solution aimed at the early management of patients through consultation with their general practitioner and their healthcare circuit. Concretely, it is a predictive artificial intelligence interface that makes it possible to cross-reference the symptom, diagnosis and medical treatment data of the population in real time in order to make more accurate prognoses, choose better-adapted treatments and reduce side effects.